
Conversation

@sychen52 (Contributor) commented Jan 9, 2026

What does this PR do?

Type of change: new feature

Overview:

Usage

cd ./examples/llm_ptq/
python hf_ptq.py \
    --pyt_ckpt_path Qwen/Qwen3-4B \
    --export_path /home/scratch.shiychen_coreai/quantized_models/Qwen3-4B-svdq \
    --qformat nvfp4_awq_svdquant --kv_cache_qformat none --sparsity_fmt dense --calib_size 8

Testing

Exported the quantized checkpoint and loaded it back.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features
    • Added nvfp4_svdquant as a new quantization format option for LLM quantization workflows.
  • Limitations
    • Multi-GPU export configurations using tensor or pipeline parallelism are not supported with nvfp4_svdquant quantization.


@sychen52 sychen52 requested review from a team as code owners January 9, 2026 05:16
@sychen52 sychen52 force-pushed the svdquant branch 2 times, most recently from a788b53 to 34e75e5, on January 9, 2026 05:22
codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.23%. Comparing base (c1956b8) to head (dc02325).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #754   +/-   ##
=======================================
  Coverage   74.22%   74.23%           
=======================================
  Files         192      192           
  Lines       19035    19038    +3     
=======================================
+ Hits        14129    14132    +3     
  Misses       4906     4906           


@jingyu-ml (Contributor) left a comment:

LGTM overall, including the approach for fusing the QKV and FFN layers. The current resmooth + refusion process means the resulting model is not exactly identical to the original, but this appears to be the only viable option at the moment unless we can fuse these layers during calibration...
Thank you for your work!
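
As a side note for readers, here is a minimal sketch of the resmooth idea, assuming each fused module carries an input_quantizer with a pre_quant_scale buffer (as in the review snippets quoted below). The averaging rule and the resmooth_fused_modules name are illustrative assumptions, not the committed code:

import torch

# Illustrative sketch only: fused projections (e.g. q/k/v) must share one
# input pre_quant_scale, so the per-module scales are averaged and each
# module is updated to the shared value.
def resmooth_fused_modules(modules):
    avg = torch.stack(
        [m.input_quantizer.pre_quant_scale for m in modules]
    ).mean(dim=0)
    for module in modules:
        if not torch.equal(module.input_quantizer.pre_quant_scale, avg):
            # modelopt's _update_pre_quant_scale also folds the scale change
            # back into the module weight; shown here as a plain assignment
            module.input_quantizer.pre_quant_scale = avg

The "refusion" step then re-fuses the consistently smoothed weights, which is why the result is close to, but not bit-identical with, the original model.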

def svd(weight, rank):
    original_device = weight.device
    original_dtype = weight.dtype
    weight_f64 = weight.to(dtype=torch.float64, device=original_device)
Collaborator:

do we need f64?

@sychen52 (author) replied:

I am not sure. I kept what @jingyu-ml had originally. This part is just a refactoring so that I can reuse this code during QKV fusion.

@meenchen (Contributor) left a comment:

LGTM overall

for module in modules:
    if not torch.equal(module.input_quantizer.pre_quant_scale, avg_prequant_scale):
        _update_pre_quant_scale(module, avg_prequant_scale)
if hasattr(modules[0].weight_quantizer, "svdquant_lora_a"):
Contributor:

Can svdquant_lora_a be None in any case?

@sychen52 (author) replied:

Good point. I will skip it when it is None.
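
A minimal sketch of that guard, following the loop quoted above (the getattr form is illustrative, not necessarily the committed code):

lora_a = getattr(modules[0].weight_quantizer, "svdquant_lora_a", None)
if lora_a is not None:
    # only process the SVD low-rank factors when they are actually set
    ...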

@meenchen (Contributor) commented:

Do we have unit tests for svd quant?

@sychen52 (author) replied:

> Do we have unit tests for svd quant?

I think we have unit tests for svdquant, but not for this export part.
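
For reference, a rough sketch of what an export-path test could look like, assuming the mtq.NVFP4_SVDQUANT_DEFAULT_CFG added in this PR and the existing export_hf_checkpoint entry point; the tiny_causal_lm fixture and the one-batch calibration loop are placeholders, not actual project fixtures:

import torch
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

def test_nvfp4_svdquant_export(tmp_path, tiny_causal_lm):  # hypothetical fixture
    def forward_loop(model):
        # a single dummy batch stands in for real calibration data
        model(torch.randint(0, 128, (1, 32)))

    model = mtq.quantize(tiny_causal_lm, mtq.NVFP4_SVDQUANT_DEFAULT_CFG, forward_loop)
    export_hf_checkpoint(model, export_dir=tmp_path)
    assert (tmp_path / "config.json").exists()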

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

This change introduces support for NVFP4 SVDQUANT (SVD-based quantization) throughout the modelopt export pipeline. It adds configuration options, defines a new quantization constant, extends quantization utilities to recognize and process SVD quantization, adds an SVD computation helper, and updates export logic to handle this quantization type consistently with existing variants.

Changes

  • Configuration & Example Scripts (examples/llm_ptq/hf_ptq.py, examples/llm_ptq/scripts/huggingface_example.sh): Added nvfp4_svdquant to the quantization configuration choices, mapped to mtq.NVFP4_SVDQUANT_DEFAULT_CFG; added a runtime guard raising NotImplementedError for multi-GPU export; updated the shell script to accept nvfp4_svdquant in qformat validation.
  • Export Model Configuration (modelopt/torch/export/model_config.py): Introduced the QUANTIZATION_NVFP4_SVDQUANT constant and updated hidden_size calculations to include this quantization type alongside QUANTIZATION_NVFP4 and QUANTIZATION_NVFP4_AWQ for both MoE and non-MoE branches.
  • Quantization Utilities & Post-processing (modelopt/torch/export/quant_utils.py, modelopt/torch/export/postprocess.py): Extended quantization format detection and scaling-factor retrieval to recognize SVDQUANT; added an _update_svdquant helper to recompute pre_quant_scale and LoRA weights and rebuild quantizer statistics; updated TP-merge logic to treat SVDQUANT equivalently to NVFP4_AWQ for weight scaling-factor updates.
  • Unified Export (modelopt/torch/export/unified_export_hf.py): Expanded conditional logic to include QUANTIZATION_NVFP4_SVDQUANT in pre-quant fusion, MoE expert processing, and weight export paths (transposition and quantization steps).
  • Core Quantization Implementation (modelopt/torch/quantization/model_calib.py): Added an svd(weight, rank) helper that computes a truncated SVD in double precision and returns the singular vectors (see the sketch after this list); refactored the svdquant postprocess to use this helper and directly assign vt and us to svdquant_lora_a and svdquant_lora_b.
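
Based on that description and the snippet quoted earlier in the review, a plausible sketch of the helper (the truncation details and return order are inferred, not copied from the PR):

import torch

def svd(weight, rank):
    # compute the SVD in float64 for numerical stability, then return
    # rank-truncated factors cast back to the original dtype and device
    original_device = weight.device
    original_dtype = weight.dtype
    weight_f64 = weight.to(dtype=torch.float64, device=original_device)
    u, s, vt = torch.linalg.svd(weight_f64, full_matrices=False)
    us = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    return us.to(original_dtype), vt[:rank, :].to(original_dtype)

The low-rank product us @ vt approximates the component pulled out of the weight before 4-bit quantization, which appears to be what the svdquant_lora_a/svdquant_lora_b buffers store.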

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 70.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check ✅: Check skipped — CodeRabbit's high-level summary is enabled.
  • Title Check ✅: The title accurately describes the main change: adding support for exporting svdquant-quantized checkpoints to Hugging Face format. It is specific, clear, and directly related to the primary objective of the PR.


@coderabbitai bot left a comment:

Actionable comments posted: 1

♻️ Duplicate comments (1)
modelopt/torch/export/quant_utils.py (1)

498-500: Fix SVDQuant detection to check actual LoRA buffers (not just attribute presence).

hasattr(weight_quantizer, "svdquant_lora_a") is true even when the buffer is unset, so non‑SVD NVFP4 quantizers could be misclassified as SVDQUANT. Use a value check (and include svdquant_lora_b) to avoid false positives.

🛠️ Proposed fix
-            if input_quantizer is not None and hasattr(weight_quantizer, "svdquant_lora_a"):
-                return QUANTIZATION_NVFP4_SVDQUANT
+            if (
+                input_quantizer is not None
+                and getattr(weight_quantizer, "svdquant_lora_a", None) is not None
+                and getattr(weight_quantizer, "svdquant_lora_b", None) is not None
+            ):
+                return QUANTIZATION_NVFP4_SVDQUANT

Comment on lines +510 to +513

if (
    args.inference_tensor_parallel != 1 or args.inference_pipeline_parallel != 1
) and args.qformat == "nvfp4_svdquant":
    raise NotImplementedError("Svdquant does not support mulitple GPUs yet.")

⚠️ Potential issue | 🟡 Minor

Fix typo in error message.

"mulitple" should be "multiple".

✏️ Proposed fix
             if (
                 args.inference_tensor_parallel != 1 or args.inference_pipeline_parallel != 1
             ) and args.qformat == "nvfp4_svdquant":
-                raise NotImplementedError("Svdquant does not support mulitple GPUs yet.")
+                raise NotImplementedError("Svdquant does not support multiple GPUs yet.")
